Information Discovery based on Multi-granularity Text Fusion
Authors
Abstract
In this paper we introduce Multi-granularity Text Fusion (MGTF), a new algorithm for information discovery on the Web. Granularity refers to the length of news-related web documents produced by web users, such as news web pages, blogs, and microblogs: the longer the text, the higher its granularity. Given a topic query and the resulting time-stamped web documents of different granularities that contain the query keywords, the task of MGTF is to return, in order, the documents of different granularities that discuss the same topic. By integrating time, content, reprint, and link information, the multi-granularity analysis surfaces previously unknown information and opinions that are, respectively, valuable, potential, minority, or contentious. Experiments show that MGTF achieves the best overall performance with high effectiveness and robustness.
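The input and output described in the abstract can be pictured with a minimal Python sketch. This is not the MGTF algorithm, whose details the abstract does not give. The `WebDocument` type, the word-count measure of granularity, the keyword filter, and the ordering rule (higher granularity first, then earlier timestamp) are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class WebDocument:
    # Hypothetical record type; the field names are not from the paper.
    url: str
    text: str
    timestamp: datetime


def granularity(doc: WebDocument) -> int:
    """Assumed measure: text length in words (longer text = higher granularity)."""
    return len(doc.text.split())


def matches_query(doc: WebDocument, keywords: list[str]) -> bool:
    """True if every query keyword appears as a word in the document text."""
    words = set(doc.text.lower().split())
    return all(keyword.lower() in words for keyword in keywords)


def order_by_granularity_and_time(
    docs: list[WebDocument], keywords: list[str]
) -> list[WebDocument]:
    """Keep documents containing the keywords and order them.

    Assumed ordering: higher granularity first, ties broken by earlier
    timestamp. The abstract does not specify the actual ordering criterion.
    """
    hits = [doc for doc in docs if matches_query(doc, keywords)]
    return sorted(hits, key=lambda doc: (-granularity(doc), doc.timestamp))
```

Under these assumptions, a news page and a microblog post that both mention the query keyword would be returned together, with the longer news page listed first.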
Similar resources
User Interests Modeling Based on Multi-source Personal Information Fusion and Semantic Reasoning
User interests are usually distributed in different systems on the Web. Traditional user interest modeling methods are not designed for integrating and analyzing interests from multiple sources, hence, they are not very effective for obtaining comparatively complete description of user interests in the distributed environment. In addition, previous studies concentrate on the text level analysis...
A fusion approach for managing multi-granularity linguistic term sets in decision making
The aim of this paper is to present a fusion approach of multi-granularity linguistic information for managing information assessed in different linguistic term sets (multi-granularity linguistic term sets) together with its application in a decision making problem with multiple information sources, assuming that the linguistic performance values given to the alternatives by the different sources...
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English language are hyphenated and hyphen is used to separate different parts. Persian language consists of multi-part words as well. Based on Persian morphology, half-space character is needed to separate parts of multi-part words where in many cases people incorrectly use space character instead of half-space character. This common incorrectly use of space leads to some s...
Short Text Hashing Improved by Integrating Multi-granularity Topics and Tags
Due to computational and storage efficiencies of compact binary codes, hashing has been widely used for large-scale similarity search. Unfortunately, many existing hashing methods based on observed keyword features are not effective for short texts due to the sparseness and shortness. Recently, some researchers try to utilize latent topics of certain granularity to preserve semantic similarity ...
Fusion of Thermal Infrared and Visible Images Based on Multi-scale Transform and Sparse Representation
Due to the differences between the visible and thermal infrared images, combination of these two types of images is essential for better understanding the characteristics of targets and the environment. Thermal infrared images have most importance to distinguish targets from the background based on the radiation differences, which work well in all-weather and day/night conditions also in land s...
Journal title:
Volume, issue:
Pages: -
Publication date: 2013